UB-H: an unbalanced-hierarchical layer binary-wise construction method for high-dimensional data
Abstract
Cloud computing, in which data is distributed, stored, and managed, is drawing attention as data generation and storage volumes increase. In addition, research on green computing, which increases energy efficiency, is also widely studied. An index is constructed to retrieve data from a huge dataset efficiently, and layer-based indexing methods are used for efficient query processing. These methods construct a list of layers, so that only one layer is required for information retrieval instead of the entire dataset. The existing methods construct the layers using a convex hull algorithm. However, the execution time of this method is very high, especially on large, high-dimensional datasets. Furthermore, if the total number of layers increases, the resulting query processing is efficient, but slow. In this paper, we propose an unbalanced-hierarchical layer method, which hierarchically divides the dimensions of the input data to increase the total number of layers and reduce the index building time. We demonstrate through various experiments that the proposed procedure significantly reduces the building time compared with existing methods.
Similar resources
Fast Binary Embedding for High-Dimensional Data
Binary embedding of high-dimensional data requires long codes to preserve the discriminative power of the input space. Traditional binary coding methods often suffer from very high computation and storage costs in such a scenario. To address this problem, we propose two solutions which improve over existing approaches. The first method, Bilinear Binary Embedding (BBE), converts high-dimensional ...
Hierarchical Binary Histograms for Summarizing Multi-Dimensional Data
The need to compress data into synopses of summarized information often arises in many application scenarios, where the aim is to retrieve aggregate data efficiently, possibly trading off the computational efficiency with the accuracy of the estimation. A widely used approach for summarizing multi-dimensional data is the histogram-based representation scheme, which consists in partitioning the ...
Methods for regression analysis in high-dimensional data
With advances in science, knowledge and technology, new and precise methods for measuring, collecting and recording information have been developed, which has led to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...
PCS: An Efficient Clustering Method for High-Dimensional Data
Clustering algorithms play an important role in data analysis and information retrieval. How to obtain a clustering for a large set of high-dimensional data suitable for database applications remains a challenge. We devise in this paper a set-theoretic clustering method called PCS (Pairwise Consensus Scheme) for high-dimensional data. Given a large set of d-dimensional data, PCS first constructs...
An $\ell_1$-Method for Clustering High-Dimensional Data
In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the unreliability of distances in very high-dimensional spaces. We propose a distance-based iterative method for clustering data in very high-dimensional space, usin...
Journal
Journal title: Computing
Year: 2021
ISSN: 0010-485X, 1436-5057
DOI: https://doi.org/10.1007/s00607-020-00871-0